Probabilistic suffix models for API sequence analysis of Windows XP applications
نویسندگان
چکیده
Given the pervasive nature of malicious mobile code (viruses, worms, etc.), developing statistical/structural models of code execution is of considerable importance. We investigate using probabilistic suffix trees (PSTs) and associated suffix automata (PSAs) to build models of benign application behavior with the goal of subsequently being able to detect malicious applications as anything that deviates therefrom. We describe these probabilistic suffix models and present new generic analysis and manipulation algorithms. The models and the algorithms are then used in the context of API (i.e., system call) sequences realized by Windows XP applications. The analysis algorithms, when applied to traces (i.e., sequences of API calls) of benign and malicious applications, aid in choosing an appropriate modeling strategy in terms of distance metrics and consequently provide classification measures in terms of sequence-to-model matching. We give experimental results based on classification of unobserved traces of benign and malicious applications against a suffix model trained solely from traces generated by a small set of benign applications. 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
منابع مشابه
Average Size of a Suffix Tree for Markov Sources
We study a suffix tree built from a sequence generated by a Markovian source. Such sources are more realistic probabilistic models for text generation, data compression, molecular applications, and so forth. We prove that the average size of such a suffix tree is asymptotically equivalent to the average size of a trie built over n independent sequences from the same Markovian source. This equiv...
متن کاملCompact Suffix Trees Resemble PATRICIA Tries: Limiting Distribution of the Depth
Suffix trees are the most frequently used data structures in algorithms on words. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. For a biased memoryless source, we prove that the limiting distribution for the depth in a PAT tree is the same as the limiting distribution for the depth in a PATRICIA trie, even...
متن کاملImplementing Boot Control for Windows Vista
A digital forensic logging system must prevent the booting of unauthorized programs and the modification of evidence. Our previous research developed Dig-Force2, a boot control system for Windows XP platforms that employs API hooking and a trusted platform module. However, Dig-Force2 cannot be used for Windows Vista systems because the hooked API cannot monitor booting programs in user accounts...
متن کاملAppl Bioinformatics 2004; 3 (2-3): 193-200
Statistical analysis of amino acid and nucleotide sequences, especially sequence alignment, is one of Abstract the most commonly performed tasks in modern molecular biology. However, for many tasks in bioinformatics, the requirement for the features in an alignment to be consecutive is restrictive and ‘n-grams’ (aka k-tuples) have been used as features instead. N-grams are usually short nucleot...
متن کاملDetecting Malicious Behaviors of Software through Analysis of API Sequence k-gramsi
Nowadays, software is widely applied to increase accuracy, efficiency, and convenience in various areas in our life. So, it is essential to use software in our recent computing environments. Despite of the valuable applications of software, malicious behaviors caused by vulnerability of software threaten our secure computing environments. So, it is important to identify and detect malicious beh...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Pattern Recognition
دوره 41 شماره
صفحات -
تاریخ انتشار 2008